Semi-parametric and Non-parametric Term Weighting for Information Retrieval

Authors

  • Donald Metzler
  • Hugo Zaragoza
Abstract

Most previous research on term weighting for information retrieval has focused on developing specialized parametric term weighting functions. Examples include TF.IDF vector-space formulations, BM25, and language modeling weighting. Each of these functions takes a specific parametric form. While they have proven highly effective, they impose strict constraints on the functional form of the term weights, and such constraints may degrade retrieval effectiveness. In this paper we propose two new classes of term weighting schemes, which we call semi-parametric and non-parametric weighting. These schemes make fewer assumptions about the underlying term weights and allow the data to speak for itself. We argue that these robust weighting schemes have the potential to be significantly more effective than existing parametric schemes, especially as the amount of available training data grows.
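To illustrate what a "specific parametric form" looks like, the following minimal Python sketch computes a BM25 weight for one term in one document. It uses one common BM25 variant; the function name, the argument names, and the default values k1 = 1.2 and b = 0.75 are assumptions made for illustration and are not taken from the paper.

```python
import math


def bm25_term_weight(tf, df, doc_len, avg_doc_len, num_docs, k1=1.2, b=0.75):
    """Weight of a single term in a single document under one common BM25 variant.

    tf          -- term frequency in the document
    df          -- number of documents containing the term
    doc_len     -- length of the document (in tokens)
    avg_doc_len -- average document length in the collection
    num_docs    -- number of documents in the collection
    k1, b       -- free parameters (illustrative defaults, not from the paper)
    """
    # Inverse document frequency, smoothed so that it is never negative.
    idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
    # Document length normalization controlled by b.
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # Saturating function of tf controlled by k1.
    return idf * (tf * (k1 + 1.0)) / (tf + k1 * norm)
```

For example, `bm25_term_weight(tf=3, df=10, doc_len=100, avg_doc_len=100, num_docs=1000)` returns a positive weight that saturates as `tf` grows. The shape of that saturation is fixed by the formula, and only `k1` and `b` can be tuned, which is the kind of constraint on functional form that the abstract refers to.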

Similar resources

A new weighting approach to Non-Parametric composite indices compared with principal components analysis

The introduction of the Human Development Index (HDI) by UNDP in early 1990 was followed by a surge in the use of non-parametric and parametric indices for measuring and comparing countries' performance in development, globalization, competition, well-being, etc. The HDI is a composite index of three indicators. Its components are meant to reflect three major dimensions of human development: longevity, knowledg...

An Adaptive Context-Based Algorithm for Term Weighting: Application to Single-Word Question Answering

Term weighting systems are of crucial importance in Information Extraction and Information Retrieval applications. Common approaches to term weighting are based either on statistical or on natural language analysis. In this paper, we present a new algorithm that capitalizes on the advantages of both strategies by adopting a machine learning approach. In the proposed method, the weights ar...

Evaluation Approaches of Value at Risk for Tehran Stock Exchange

The purpose of this study is to estimate the daily Value at Risk (VaR) for the total index of the Tehran Stock Exchange using parametric, nonparametric and semi-parametric approaches. Conditional and unconditional coverage backtesting are used to evaluate the accuracy of the calculated VaR and to compare the performance of these approaches. In most cases, based on backtesting statistics Results, ...

Semi-parametric Quantile Regression for Analysing Continuous Longitudinal Responses

Recently, quantile regression (QR) models have often been applied to longitudinal data analysis. When the distribution of responses is skewed and asymmetric due to outliers and heavy tails, QR models may work well. In this paper, a semi-parametric quantile regression model is developed for analysing continuous longitudinal responses. The error term's distribution is assumed to be Asymmetr...

Adaptive Context-Based Term (Re)Weighting: An Experiment on Single-Word Question Answering

Term weighting is a crucial task in many Information Retrieval applications. Common approaches are based either on statistical or on natural language analysis. In this paper, we present a new algorithm that capitalizes on the advantages of both strategies. In the proposed method, the weights are computed by a parametric function, called Context Function, that models the semantic influence...

Publication date: 2009